Ismailia
Investigating Cultural Alignment of Large Language Models
AlKhamissi, Badr, ElNokrashy, Muhammad, AlKhamissi, Mai, Diab, Mona
The intricate relationship between language and culture has long been a subject of exploration within linguistic anthropology. Large Language Models (LLMs), promoted as repositories of collective human knowledge, raise a pivotal question: do these models genuinely encapsulate the diverse knowledge adopted by different cultures? Our study reveals that these models demonstrate greater cultural alignment along two dimensions: first, when prompted in the dominant language of a specific culture, and second, when pretrained with a refined mixture of the languages employed by that culture. We quantify cultural alignment by simulating sociological surveys and comparing model responses to those of actual survey participants as references. Specifically, we replicate a survey conducted in various regions of Egypt and the United States by prompting LLMs with different pretraining data mixtures, in both Arabic and English, with the personas of the real respondents and the survey questions. Further analysis reveals that misalignment becomes more pronounced for underrepresented personas and for culturally sensitive topics, such as those probing social values. Finally, we introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment. Our study emphasizes the necessity of a more balanced multilingual pretraining dataset to better represent the diversity of human experience and the plurality of different cultures, with many implications for cross-lingual transfer.
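The abstract does not spell out how model and human answers are compared. As a minimal sketch, assuming exact-match agreement on categorical answers (the paper's actual metric may differ, for example on ordinal scales), alignment can be scored per persona group so that gaps for underrepresented personas become visible. The `Response` record, its fields, and the group labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Response:
    group: str          # hypothetical persona group, e.g. region of the real respondent
    question_id: str
    human_answer: str   # answer given by the actual survey participant
    model_answer: str   # answer the LLM gave when prompted with that participant's persona

def alignment_by_group(responses):
    """Exact-match agreement between model and human answers, averaged per persona group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in responses:
        totals[r.group] += 1
        hits[r.group] += int(r.model_answer == r.human_answer)
    return {g: hits[g] / totals[g] for g in totals}

# Usage with made-up answers (illustrative only)
rs = [
    Response("Cairo", "q1", "agree", "agree"),
    Response("Cairo", "q2", "disagree", "agree"),
    Response("Aswan", "q1", "agree", "agree"),
]
print(alignment_by_group(rs))  # {'Cairo': 0.5, 'Aswan': 1.0}
```

The same scoring could be repeated per prompt language or per pretraining mixture to compare the two dimensions the abstract describes.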
Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
Hayes, Jamie, Shumailov, Ilia, Triantafillou, Eleni, Khalifa, Amr, Papernot, Nicolas
The high cost of model training makes it increasingly desirable to develop techniques for unlearning. These techniques seek to remove the influence of a training example without having to retrain the model from scratch. Intuitively, once a model has unlearned, an adversary that interacts with the model should no longer be able to tell whether the unlearned example was included in the model's training set or not. In the privacy literature, this is known as membership inference. In this work, we discuss adaptations of Membership Inference Attacks (MIAs) to the setting of unlearning (leading to their "U-MIA" counterparts). We propose a categorization of existing U-MIAs into "population U-MIAs", where the same attacker is instantiated for all examples, and "per-example U-MIAs", where a dedicated attacker is instantiated for each example. We show that the latter category, wherein the attacker tailors its membership prediction to each example under attack, is significantly stronger. Indeed, our results show that the commonly used U-MIAs in the unlearning literature overestimate the privacy protection afforded by existing unlearning techniques on both vision and language models. Our investigation reveals a large variance in the vulnerability of different examples to per-example U-MIAs. In fact, several unlearning algorithms lead to a reduced vulnerability for some, but not all, examples that we wish to unlearn, at the expense of increasing it for other examples. Notably, we find that the privacy protection for the remaining training examples may worsen as a consequence of unlearning. We also discuss the fundamental difficulty of equally protecting all examples using existing unlearning schemes, due to the different rates at which examples are unlearned. We demonstrate that naive attempts at tailoring unlearning stopping criteria to different examples fail to alleviate these issues.
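To make the two categories concrete, the sketch below contrasts a population attack, which applies one global loss threshold to every example, with a per-example attack in the style of likelihood-ratio attacks (LiRA), which fits Gaussians to one example's losses under shadow models that were trained on it ("in", i.e. not unlearned) or retrained without it ("out"). The function names, the loss-based score, and the Gaussian fit are assumptions for illustration, not the exact attacks evaluated in this work.

```python
import math
from statistics import mean, stdev

def population_attack(loss, threshold):
    """Population U-MIA: one global threshold for all examples.
    True means the example still looks like a training member (not unlearned)."""
    return loss < threshold

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def per_example_attack(loss, in_losses, out_losses, eps=1e-6):
    """Per-example U-MIA (LiRA-style): log-likelihood ratio of the target model's loss under
    Gaussians fitted to this example's shadow-model losses. Positive => looks like a member."""
    mu_in, sd_in = mean(in_losses), max(stdev(in_losses), eps)
    mu_out, sd_out = mean(out_losses), max(stdev(out_losses), eps)
    return gaussian_logpdf(loss, mu_in, sd_in) - gaussian_logpdf(loss, mu_out, sd_out)

# Usage: an intrinsically hard example whose loss stays high even when it is trained on
print(population_attack(2.0, threshold=1.0))  # False: looks unlearned to the global threshold
print(per_example_attack(2.0, in_losses=[1.9, 2.0, 2.1], out_losses=[3.9, 4.0, 4.1]) > 0)  # True
```

In the usage, the hard example looks "unlearned" to the global threshold, yet the per-example attack flags it because its own "in" losses sit near 2.0. This is one way a population attack can overestimate the protection an unlearning method provides.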
Iran appears to have struck ship off Indian coast with UAV: US Official
Iran appears to have struck a ship off the Indian coast with an unmanned aerial vehicle, a U.S. official told Fox News on Saturday. The strike comes as Houthi militants targeted multiple cargo ships on Saturday, firing two anti-ship ballistic missiles into international shipping lanes in the southern Red Sea, according to U.S. Central Command. No ships were hit by the ballistic missiles, officials said. The USS Laboon shot down four drones on Saturday that originated from Houthi-controlled areas of Yemen.
COVID-19 Status Forecasting Using New Viral Variants and Vaccination Effectiveness Models
Rashed, Essam A., Kodera, Sachiko, Hirata, Akimasa
Background: Recently, high numbers of daily positive COVID-19 cases have been reported in regions with relatively high vaccination rates; hence, booster vaccination has become necessary. In addition, infections caused by the different variants and their correlated factors have not been discussed in depth. Given the large variability and the many co-factors involved, it is difficult to forecast the incidence of COVID-19 with conventional mathematical models. Methods: Machine learning based on long short-term memory (LSTM) was applied to forecast the time series of new daily positive cases (DPC), serious cases, hospitalized cases, and deaths. Data acquired from regions with high vaccination rates, such as Israel, were blended with current data from regions in Japan to factor in the potential effects of vaccination. The protection provided by symptomatic infection was also considered, in terms of the population effectiveness of vaccination, the waning of protection, and the ratio and infectivity of viral variants. To represent changes in public behavior, public mobility and interactions through social media were also included in the analysis. Findings: Comparing the observed and estimated new DPC in Tel Aviv, Israel, showed that the parameters characterizing vaccination effectiveness and the waning of protection against infection were well estimated; against infection by the delta variant, the effectiveness of the second dose after five months and of the third dose after two weeks were 0.24 and 0.95, respectively. Using the extracted vaccination-effectiveness parameters, new cases in three prefectures of Japan were replicated.
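The abstract names LSTM networks but not their configuration. The following numpy sketch, under stated assumptions, shows the two generic ingredients: framing a multivariate daily series (for example, new DPC alongside mobility and social-media features) as sliding-window training pairs, and running one LSTM layer over a window. The gate ordering, window length, feature columns, and the random, untrained weights are all hypothetical; this is not the authors' model.

```python
import numpy as np

def make_windows(series, window, horizon=1, target_col=0):
    """Turn a (T, F) daily series into supervised pairs:
    X[i] = rows i..i+window-1 (all features), y[i] = target_col at row i+window+horizon-1."""
    n = series.shape[0] - window - horizon + 1
    X = np.stack([series[i:i + window] for i in range(n)])
    y = np.array([series[i + window + horizon - 1, target_col] for i in range(n)])
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, hidden):
    """One LSTM layer over a (window, F) sequence; returns the last hidden state.
    W: (4H, F), U: (4H, H), b: (4H,); gates assumed ordered input, forget, cell, output."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:hidden])                  # input gate
        f = sigmoid(z[hidden:2 * hidden])         # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])     # candidate cell state
        o = sigmoid(z[3 * hidden:4 * hidden])     # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Usage: 60 days x 3 hypothetical features (column 0 = new DPC), 14-day windows
rng = np.random.default_rng(0)
T, F, H = 60, 3, 8
data = rng.random((T, F))
X, y = make_windows(data, window=14)
W = rng.normal(0, 0.1, (4 * H, F))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(0, 0.1, H)
pred = w_out @ lstm_forward(X[0], W, U, b, H)  # untrained one-day-ahead forecast
```

In practice the weights would be fitted by gradient descent on the (X, y) pairs; the sketch only shows the data flow.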
Knowledge discovery from emergency ambulance dispatch during COVID-19: A case study of Nagoya City, Japan
Rashed, Essam A., Kodera, Sachiko, Shirakami, Hidenobu, Kawaguchi, Ryotetsu, Watanabe, Kazuhiro, Hirata, Akimasa
Accurate forecasting of medical service requirements is an important big data problem that is crucial for resource management in critical times such as natural disasters and pandemics. With the global spread of coronavirus disease 2019 (COVID-19), several concerns have been raised regarding the ability of medical systems to handle sudden changes in the daily routines of healthcare providers. One significant problem is the management of ambulance dispatch and control during a pandemic. To help address this problem, we first analyze ambulance dispatch records from April 2014 to August 2020 for Nagoya City, Japan. Significant changes were observed in the data during the pandemic, including during the state of emergency (SoE) declared across Japan. In this study, we propose a deep learning framework based on recurrent neural networks to estimate the number of emergency ambulance dispatches (EADs) during an SoE. The fused data include environmental factors, the localization data of mobile phone users, and the past history of EADs, thereby providing a general framework for knowledge discovery and better resource management. The results indicate that the proposed blend of training data can be used effectively for real-world estimation of EAD requirements during periods of high uncertainty, such as pandemics.
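The fusion step can be illustrated with a small, hypothetical sketch: daily sources keyed by date are joined into one feature row per day, with lagged EAD counts standing in for the past history of EADs. The function name, the temperature and mobility inputs, and the 1-day and 7-day lags are assumptions; the paper's recurrent model and exact features are not reproduced here.

```python
from datetime import date, timedelta

def fuse_daily_features(ead, temperature, mobility, lags=(1, 7)):
    """Join per-day sources (dicts keyed by datetime.date) into feature rows.
    Row: [temperature, mobility, EAD at each lag]; target: EAD on that day.
    Days missing any input or any lagged EAD value are skipped."""
    rows, targets, days = [], [], []
    for day in sorted(ead):
        lagged = [ead.get(day - timedelta(days=k)) for k in lags]
        if day not in temperature or day not in mobility or None in lagged:
            continue
        rows.append([temperature[day], mobility[day], *lagged])
        targets.append(ead[day])
        days.append(day)
    return rows, targets, days

# Usage with made-up values for ten days
start = date(2020, 4, 1)
ead = {start + timedelta(days=i): 100 + i for i in range(10)}
temperature = {d: 15.0 for d in ead}
mobility = {d: 0.8 for d in ead}
rows, targets, days = fuse_daily_features(ead, temperature, mobility)
print(rows[0], targets[0])  # [15.0, 0.8, 106, 100] 107
```

The resulting rows could then be windowed into sequences for a recurrent network.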
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
Abdul-Mageed, Muhammad, Zhang, Chiyu, Bouamor, Houda, Habash, Nizar
We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). The shared task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task cover a total of 100 provinces from 21 Arab countries and were collected from Twitter. As such, NADI is the first shared task to target naturally occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.
Development of accurate human head models for personalized electromagnetic dosimetry using deep learning
Rashed, Essam A., Gomez-Tames, Jose, Hirata, Akimasa
The development of personalized human head models from medical images has become an important topic in the electromagnetic dosimetry field, with applications including the optimization of electrostimulation and safety assessments. Human head models are commonly generated by segmenting magnetic resonance images into different anatomical tissues. This process is time-consuming and requires specialized expertise to segment a relatively large number of tissues. Thus, it is challenging to accurately compute the electric field in specific brain regions. Recently, deep learning has been applied to segmentation of the human brain. However, most studies have focused on segmenting brain tissue only, and little attention has been paid to other tissues, which are considerably important for electromagnetic dosimetry. In this study, we propose a new convolutional neural network architecture, named ForkNet, to perform segmentation of whole human head structures, which is essential for evaluating the electric field distribution in the brain. The proposed network can be used to generate personalized head models and applied to evaluate the electric field in the brain during transcranial magnetic stimulation. Our computational results indicate that the head models generated using the proposed network closely match those created via manual segmentation in an intra-scanner segmentation task.
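The abstract reports close agreement with manual segmentation without naming the metric. A common choice for comparing label volumes is the per-tissue Dice coefficient, 2|A∩B| / (|A| + |B|); the minimal sketch below assumes that metric and integer tissue labels, and `dice_per_label` is a hypothetical helper.

```python
import numpy as np

def dice_per_label(pred, ref, labels):
    """Dice = 2|A∩B| / (|A| + |B|) for each tissue label in two equally shaped label volumes.
    A label absent from both volumes scores 1.0 by convention."""
    scores = {}
    for lab in labels:
        a = pred == lab
        b = ref == lab
        denom = a.sum() + b.sum()
        scores[lab] = 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom
    return scores

# Usage on a toy 2x3 "volume" (0 = background, 1 and 2 = hypothetical tissues)
ref = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 0], [2, 2, 2]])
print(dice_per_label(pred, ref, [0, 1, 2]))  # label 0: 0.5, label 1: ~0.667, label 2: 0.8
```

Reporting Dice per tissue rather than a single pooled value keeps small, low-contrast tissues from being masked by large ones.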
Learning-based estimation of dielectric properties and tissue density in head models for personalized radio-frequency dosimetry
Rashed, Essam A., Diao, Yinliang, Hirata, Akimasa
Radio-frequency dosimetry is an important process for ensuring human safety and the compliance of related products. Recently, computational human models generated from medical images have often been used for such assessments, especially to account for inter-subject variability. However, the common procedure for developing personalized models is time-consuming because it involves extensive segmentation of several components representing different biological tissues, which limits the assessment of inter-subject variability in radiation safety based on personalized dosimetry. Deep learning methods have been shown to be a powerful approach for pattern recognition and signal analysis, and deep convolutional neural networks have proven robust for feature extraction and image mapping in several biomedical applications. In this study, we develop a learning-based approach for fast and accurate estimation of the dielectric properties and density of tissues directly from magnetic resonance images in a single shot. The smooth distribution of dielectric properties in head models, obtained through a process without tissue segmentation, yields a smoother specific absorption rate (SAR) distribution than the commonly used procedure. The estimated SAR distributions, as well as the SAR averaged over 10 g of tissue in a cubic volume, are highly consistent with those computed using conventional segmentation-based methods.
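For reference, local SAR follows from the quantities the network estimates as SAR = σ|E|²/ρ, with σ the conductivity, E the RMS electric field, and ρ the mass density. The sketch below computes it voxelwise and then applies a simplified 10-g cube average in which the cube around each voxel grows until it holds at least 10 g and is clipped at the volume border. That growth rule is an assumption for illustration; standardized averaging procedures (e.g. IEC/IEEE 62704-1) may place cubes and treat boundaries differently. Density is assumed positive everywhere.

```python
import numpy as np

def local_sar(sigma, e_rms, rho):
    """Voxelwise SAR [W/kg] = sigma * |E_rms|^2 / rho.
    sigma [S/m], e_rms: RMS field magnitude [V/m], rho: density [kg/m^3], assumed > 0."""
    return sigma * e_rms ** 2 / rho

def cube_averaged_sar(sar, rho, voxel_size_m, target_mass_kg=0.010):
    """Simplified mass-averaged SAR: around each voxel, grow a cube (clipped at the border)
    until it holds at least target_mass_kg, then take the mass-weighted mean SAR."""
    mass = rho * voxel_size_m ** 3           # kg per voxel
    out = np.empty(sar.shape, dtype=float)
    max_h = max(sar.shape)                   # half-width at which the cube covers everything
    nx, ny, nz = sar.shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                for h in range(max_h + 1):
                    cube = (slice(max(i - h, 0), i + h + 1),
                            slice(max(j - h, 0), j + h + 1),
                            slice(max(k - h, 0), k + h + 1))
                    m = mass[cube].sum()
                    if m >= target_mass_kg or h == max_h:
                        break
                out[i, j, k] = (sar[cube] * mass[cube]).sum() / m
    return out

# Usage: a hypothetical uniform 6x6x6 block of 5 mm voxels
shape = (6, 6, 6)
sigma = np.full(shape, 0.5)
e_rms = np.full(shape, 2.0)
rho = np.full(shape, 1000.0)
sar = local_sar(sigma, e_rms, rho)            # 0.5 * 4 / 1000 = 0.002 W/kg everywhere
sar10g = cube_averaged_sar(sar, rho, voxel_size_m=0.005)
```

A uniform field gives the same value before and after averaging, while an isolated hotspot is spread over the surrounding 10 g of tissue, which is why averaged SAR is much less sensitive to voxel-level noise in the estimated properties.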
Non-Uniform Conductivity Estimation for Personalized Brain Stimulation using Deep Learning
Rashed, Essam A., Gomez-Tames, Jose, Hirata, Akimasa
Electromagnetic stimulation of the human brain is a key tool for the neurophysiological characterization and diagnosis of several neurological disorders. Transcranial magnetic stimulation (TMS) is one such procedure commonly used in clinical practice. However, personalized TMS requires a pipeline for accurate head model generation to provide target-specific stimulation. This process includes intensive segmentation of several head tissues based on magnetic resonance imaging (MRI), which is prone to segmentation errors, especially for low-contrast tissues. Additionally, a uniform electrical conductivity is assigned to each tissue in the model, an unrealistic assumption inherited from conventional volume conductor modeling. This paper proposes a novel approach for automatically estimating the electrical conductivity of the human head for volume conductor models without anatomical segmentation. A convolutional neural network is designed to estimate personalized electrical conductivity values based on anatomical information obtained from T1- and T2-weighted MRI scans. This approach avoids the time-consuming process of tissue segmentation and maximizes the advantages of position-dependent conductivity assignment based on water content values estimated from MRI intensity values. The proposed approach yields electric field distributions in the brain that are similar to, but smoother than, those of conventional approaches. In electromagnetic dosimetry applications, the use of computational models that imitate human anatomy is essential [1].
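The contrast between the conventional assignment and a position-dependent one can be sketched as follows. The first function assigns one conductivity per segmented tissue; the second maps a voxelwise water-content estimate to conductivity through a monotonic calibration curve using `np.interp`. The calibration points and tissue values in the usage are placeholders, not measured conductivities, and the CNN that estimates these quantities from T1- and T2-weighted MRI is not reproduced.

```python
import numpy as np

def uniform_tissue_conductivity(labels, tissue_sigma):
    """Conventional volume-conductor model: one constant conductivity per tissue label."""
    sigma = np.zeros(labels.shape, dtype=float)
    for lab, s in tissue_sigma.items():
        sigma[labels == lab] = s
    return sigma

def water_content_conductivity(water, calib_water, calib_sigma):
    """Segmentation-free alternative: position-dependent conductivity obtained by linearly
    interpolating a monotonic calibration curve at each voxel's water content."""
    return np.interp(water, calib_water, calib_sigma)

# Usage with placeholder values (not measured data)
labels = np.array([[1, 1], [2, 2]])              # hypothetical tissue labels
uniform = uniform_tissue_conductivity(labels, {1: 0.2, 2: 1.0})
water = np.array([[0.6, 0.7], [0.9, 1.0]])       # hypothetical water-content fractions
calib_water = [0.6, 0.8, 1.0]                    # placeholder calibration points
calib_sigma = [0.1, 0.5, 1.8]                    # placeholder conductivities [S/m]
varying = water_content_conductivity(water, calib_water, calib_sigma)
```

In the uniform model, voxels with the same label share one value even when their water content differs; the voxelwise mapping varies smoothly instead, which is consistent with the smoother field distributions the abstract reports.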
Sheep identity recognition, age and weight estimation datasets
Abdelhady, Aya Salama, Hassanien, Aboul Ella, Fahmy, Aly
The interest of scientists, producers, and consumers in sheep identification has been stimulated by the dramatic increase in population and the drive to increase productivity. The world population is expected to exceed 9.6 billion in 2050. Consequently, awareness of the necessity of effective livestock production has grown. Sheep are one of the main food resources. Much current research is directed toward developing real-time applications that facilitate sheep identification for breed management and gather related information such as weight and age. Weight and age are key metrics for assessing production effectiveness. Visual analysis has recently proven significantly more successful than other approaches for these tasks. Visual analysis techniques require sufficient images for testing and evaluation, so collecting a database of sheep images is a vital step toward this objective. We provide here datasets for testing and comparing such algorithms, which are under development. Our collected dataset consists of 416 color images of different sheep features in different postures. Images were collected from fifty-two sheep aged from three months to six years. For each sheep, two images were captured for both sides of the body, two images for both sides of the face, one image from the top view, one image of the hip, and one image of the teeth. The collected images cover different illumination conditions, quality levels, and angles of rotation. The dataset can be used to test sheep identification, weight estimation, and age detection algorithms. Such algorithms are crucial for disease management, animal assessment, and ownership.
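When the dataset is used to evaluate weight or age estimators, one common precaution (an assumption here, not a protocol stated by the authors) is to keep all images of the same animal in one split, so that a model is never tested on an animal it saw during training. The sketch below assumes hypothetical `(sheep_id, image_path)` records; the IDs and file names are illustrative.

```python
import random
from collections import defaultdict

def split_by_animal(records, test_fraction=0.2, seed=0):
    """records: iterable of (sheep_id, image_path) pairs.
    All images of one sheep land in the same split (train or test)."""
    by_sheep = defaultdict(list)
    for sheep_id, path in records:
        by_sheep[sheep_id].append(path)
    ids = sorted(by_sheep)
    random.Random(seed).shuffle(ids)            # reproducible shuffle of animal IDs
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [(s, p) for s in ids if s not in test_ids for p in by_sheep[s]]
    test = [(s, p) for s in ids if s in test_ids for p in by_sheep[s]]
    return train, test

# Usage with hypothetical records: 10 sheep, 3 views each
recs = [(f"sheep{i:02d}", f"sheep{i:02d}_view{v}.jpg") for i in range(10) for v in range(3)]
train, test = split_by_animal(recs)
```

For identity recognition itself, the split would instead hold out some images of every animal, since the classifier must have seen each identity during training.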